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Abstract 

In many important design problems, some decisions should be made 
by finding the global optimum of a multiextremal objective function 
subject to a set of constrains. Frequently, especially in engineering ap¬ 
plications, the functions involved in optimization process are black-box 
with unknown analytical representations and hard to evaluate. Such 
computationally challenging decision-making problems often cannot be 
solved by traditional optimization techniques based on strong supposi¬ 
tions about the problem (convexity, differentiability, etc.). Nature and 
evolutionary inspired metaheuristics are also not always successful in 
hnding global solutions to these problems due to their multiextremal 
character. In this paper, some innovative and powerful deterministic 
approaches developed by the authors to construct numerical methods 
for solving the mentioned problems are surveyed. Their efficiency is 
shown on solving both the classes of random test problems and some 
practical engineering tasks. 

Key Words. Global optimization, black-box functions, derivative-free meth¬ 
ods, Lipschitz condition, applied problems. 

1 Introduction 

Numerical approaches to efficiently find optimal parameters of mathemat¬ 
ical models arising from different real-life design problems are nowadays 
becoming more and more significant in industrial processes. Optimization 
models characterized by the functions with several local optima (typically. 


‘email: kvadim@dimes.unical.it 
tCorresponding author. 

* email: yaro@dimes.unical.it 


1 



their number is unknown and can be very high) have a particular impor¬ 
tance for practical applications. When the best set of parameters should be 
determined for these multiextremal models, traditional local optimization 
techniques, including many heuristic approaches, can be insufficient and, 
therefore, global optimization methods are used. Moreover, the objective 
functions and constraints to be examined are often black-box and hard to 
evaluate functions with unknown analytical representations. For example, 
their values can be obtained by executing some computationally expensive 
simulation, by performing a set of experiments, and so on. Such a kind 
of functions is frequently met in various fields of human activity (as, e.g., 
automatics and robotics, structural optimization, safety verification prob¬ 
lems, engineering design, network and transportation problems, mechani¬ 
cal design, chemistry and molecular biology, economics and finance, data 
classification, etc.) and corresponds to computationally challenging global 
optimization problems, being actively studied around the world (see, e.g., 
[3, 17, 25, 28, 52, 53, 55, 56, 59, 72, 73, 80, 85] and the references given 
therein). 

This paper is based upon the work [43] presented at the Eleventh Inter¬ 
national Conference on Computational Structures Technology (Dubrovnik, 
Croatia, 4-7 September 2012) and extends the previous research in both 
the theoretical and the experimental directions, thus offering to the optimal 
design community competitive tools to tackle real-life engineering decision¬ 
making tasks. 

The paper is structured as follows. In the next Section, an insight into 
black-box global optimization problems is given and some approaches for 
their solution are briefly discussed. One of these approaches are based on a 
quite natural (from the physical viewpoint) assumption that the objective 
function and constraints have bounded slopes, i.e., they satisfy the Lipschitz 
condition. Such a problem statement is formalized and examined more in 
detail in Section 3. Some approaches proposed by the authors to construct 
efficient numerical methods for solving the mentioned problems are briefly 
presented in Section 4. Section 5 illustrates the theoretical considerations of 
the paper with some numerical experiments. Finally, conclusions and future 
research directions are drawn. 

2 Black-box global optimization 

To illustrate real-world global optimization problems we deal with, let us con¬ 
sider the field of geomechanics and geophysics where one has to work with 
different mechanical-mathematical optimization models. Generally, these 
models are very complex since they can involve multidimensional linear or 
nonlinear partial differential equations with multiple contact boundaries, 
regions with sharp changes in functions values, ill-posedness, and so on. 
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Knowledge of the properties and types of geological rocks lying at a depth of 
several kilometers is of great interest, e.g., for prospecting seismology, which 
determines the location of oil fields by means of acoustic waves. Prospect¬ 
ing seismology techniques allow one both to avoid the costly exploration 
methods (e.g., well drilling) and to accelerate the process of pinpointing oil 
resources. Among these techniques, numerical methods for solving inverse 
problems are of fundamental importance in prospecting seismology. They 
aim at estimating parameters of the Earth’s structure and material prop¬ 
erties (e.g., location of the inhomogeneities as cracks/cavities in the crust) 
based on data measured on the surface. 

To give a concrete example of the resulting global optimization problem, 
a simplified version of the prospecting seismology inverse problem can be 
taken: namely, let us suppose that there is a fluid-filled crack (or a number 
of such cracks) of a given length located in the host rock with known elastic 
properties (see, e.g., [21, 38]). Then, the vector x of unknown parameters 
defining the region geometry contains only two components: the depth of 
the crack occurrence h, hi < h < h 2 , and the crack inclination angle a, 
ai < a < a 2 (hi, ^ 2 , ai, and 0:2 are known constants). 

One of the peculiarities of the stated problem is that information can be 
obtained only from experimental measurements with the usage of acoustic 
sounding (see, e.g., [49]). A number of seismic detectors are located at 
points di on the surface of the Earth, which record the vertical components 
Vy{di,tj) of particles velocity in the reflected wave at time instances tj. We 
look for such a value of x that best fits the numerically simulated response 
Vy{x, di,tj) to a measured one. Computational simulation can be performed 
by some numerical integration algorithm: for example, the grid-characteristic 
method (see, e.g., [58, 46]) can be used for this scope, thus taking into account 
the physical features of the problem and allowing one to set correctly the 
boundary and contact conditions. 

Hereby, this particular problem can be formulated as the following least 
squares optimization problem (see, e.g., [80, 85, 51, 62, 64, 77]): 

min/(x), X e D = [hi,h2] X [ai,a2], (1) 

f(x) = '^Yl^Vy{x,di,tj) - Vy{di,tj)f. (2) 

* j 

Function (2) is essentially multiextremal, it has no analytical representa¬ 
tion and its evaluation (sometimes called trial) is associated with performing 
computationally expensive numerical experiments. Therefore, the usage of 
fast and robust global optimization methods aimed at tackling this class 
of complex multiextremal problems is required for solving efficiently prob¬ 
lem (l)-(2). 
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Because of the computational costs involved, typically a small number 
of functions evaluations are available for a decision-maker (engineer, physi¬ 
cist, chemist, economist, etc.) when optimizing such costly functions. Thus, 
the main goal is to develop fast global optimization methods that produce 
acceptable solutions with a limited number of functions evaluations. But to 
obtain this goal, there are still a lot of difficulties that are mainly related ei¬ 
ther to the lack of information about the objective function (and constraints, 
if any) or to the impossibility to adequately represent the available informa¬ 
tion about the functions. 

For example, gradient-based algorithms (see, e.g., [17, 28, 55]) cannot 
be used in many applications because black-box functions are often non- 
differentiable or derivatives are not available and their finite-difference ap¬ 
proximations are too expensive to obtain. Automatic differentiation (see, 
e.g., [7]), as well as interval-based approaches (see, e.g., [8, 33]), cannot be 
appropriately used in cases of black-box functions when their source codes 
are not available. A simple alternative could be the usage of the so-called 
direct (or derivative-free) search methods (see, e.g., [34, 6, 35, 50, 9, 61]), 
frequently used now for solving engineering design problems (see, e.g., the 
DIRECT method [17, 34, 30], the response surface, or surrogate model meth¬ 
ods [31, 60], pattern search methods [81, 10, 1, 2], etc.). But unfortunately 
(see, e.g., [71, 11, 44, 57]), these methods either are designed to find only 
stationary points or can require too high computational effort for their work. 

Therefore, solving the described global optimization problems is actually 
a real challenge both for the theoretical and the applied scientists. In this 
context, deterministic global optimization is a well developed mathematical 
theory which has many important applications (see, e.g., [17, 28, 59, 72, 
80, 16]). One of its main advantages is the possibility to obtain guaranteed 
estimations of global solutions and to demonstrate (under certain analytical 
conditions) rigorous global convergence properties. However, the currently 
available deterministic models can still require too large number of functions 
evaluations to obtain adequately good solutions for these problems. 

Stochastic approaches (see, e.g., [17, 28, 53, 55, 85, 84]) can often deal 
with the stated problems in a simpler manner than the deterministic algo¬ 
rithms (being also suitable for the problems where the evaluations of the 
functions are corrupted by noise). However, there can be difficulties with 
these methods, as well (e.g., in studying their convergence properties). Sev¬ 
eral restarts can also be involved, requiring even more functions evaluations. 
Moreover, solutions found by many stochastic algorithms (especially, by pop¬ 
ular heuristic methods like evolutionary algorithms, simulated annealing, 
etc.; see, e.g., [52, 55, 27, 63, 83, 26, 32, 82]) can be only local solutions to 
the problems, far from the global ones. This can preclude such methods from 
their usage in practice, especially when an accurate estimate of the global 
solution is required. That is why we concentrate, hereafter, on deterministic 
approaches. 


4 



The possibility to outperform the ‘brute-force computation’ techniques 
in solving global optimization problems is fundamentally based on the avail¬ 
ability of some realistic a priori assumptions characterizing the objective 
function and eventual constraints (see, e.g., [28, 55, 59, 72, 80, 85]). They 
serve as mathematical tools for obtaining estimates of the global solution 
related to a finite number of function trials and, therefore, play a crucial role 
in the construction of any efficient global search algorithm. As observed, e.g., 
in [28, 78], if no particular assumptions are made on the objective function 
and constraints, any finite number of function evaluations cannot guarantee 
getting close to the global minimum value, since this function may have very 
high and narrow peaks. 

One of the natural and powerful (from both the theoretical and the ap¬ 
plied points of view) assumptions on the global optimization problem is that 
the objective function (and constraints) has (have) bounded slopes. In other 
words, any limited change in the object parameters yields limited changes 
in the characteristics of the objective performance. This assumption can be 
justified by the fact that in technical systems the energy of change is always 
limited (see the related discussion in [80]). One of the most popular math¬ 
ematical formulations of this property is the Lipschitz continuity condition, 
which assumes that the difference (in the sense of a chosen norm) of any two 
function values is majorized by the difference of the corresponding function 
arguments, multiplied by a positive factor L < oo. In this case, the func¬ 
tion is said to be Lipschitz and the corresponding factor L is said to be the 
Lipschitz constant. The problem involving Lipschitz functions (the objec¬ 
tive function and constraints) is said to be the Lipschitz global optimization 
problem (see, e.g., [28, 59, 72, 73, 80, 85, 45, 14] and the references given 
therein). 

The Lipschitz continuity assumption, being quite realistic for many prac¬ 
tical black-box problems, is also an effective tool for obtaining accurate global 
optimum estimates after performing a limited number of functions evalua¬ 
tions. It is used by the authors to develop efficient and reliable deterministic 
methods for solving multidimensional constrained global optimization prob¬ 
lems from different real-life applied areas (as, e.g., the problem (l)-(2)), 
which are characterized by black-box multiextremal and hard to evaluate 
functions. In the next Section, the Lipschitz global optimization is exam¬ 
ined more in detail. 

3 Lipschitz global optimization problem 

A general Lipschitz global optimization problem can be formalized as follows 
(see, e.g., [59, 72, 80, 85, 14]): 

f* = f{x*) = X £ Q C (3) 
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where ri is a bounded set defined as 

= {x G D : Ci(x) < 0, 1 < i < p}, (4) 

D = [a,h] = {x : a{j) <x{j) <b{j),l < j <N}, a,6GM^, (5) 

with N being the problem dimension. In (3)-(5), the objective function /(x) 
and the constraints Q{x)^ 1 < i < p, are multiextremal, non necessarily dif¬ 
ferentiable, black-box and hard to evaluate functions that satisfy the Lips- 
chitz condition over the search hyperinterval D: 

\f{x')-f{x")\<L\\x'-x"l x',x”eD, (6) 

\Ci{x') - Ci{x'')\ < Li\\x'- x''\\, x’,x"gD, l<i<p, (7) 

where || • || denotes, usually, the Euclidean norm, L and Li, 1 < i < p, are 
the (unknown) Lipschitz constants such that 0 < L < oo, 0 < Li < oo, 
1 < i < p. If p = 0 in (4), the problem is said to be box-constrained. 

The admissible region can consist of disjoint, non-convex subregions 
because of the multiextremality of the constraints Moreover, these 

constraints can be partially defined, i.e., a constraint ((j_|_i(x) (or the objective 
function /(x)) can be defined only over subregions where < 0, 1 < i < p 
(see, e.g., [73, 80] for more details and applied examples). 

Problem (3), (5), (6) with a differentiable objective function having the 
Lipschitz (with an unknown Lipschitz constant) gradient f'{x) (which could 
be itself a multiextremal black-box function) is sometimes included in the 
same class of Lipschitz global optimization problems (see, e.g., the references 
given in [80, 44, 42]). 

As evidenced, e.g., in [73, 80], it is not easy to manage multiextremal con¬ 
straints (4) within the context of Lipschitz global optimization. For example, 
the traditional penalty approach (see, e.g., the references in [17, 28, 50]) can 
lead to extremely high Lipschitz constants, thus forcing degeneration of the 
methods. In this connection, a promising approach called the index scheme 
(see, e.g., [80, 4, 70, 74]) can be applied. It does not introduce additional 
variables and/or parameters by opposition as, e.g., many traditional penalty 
approaches do, and reduces the general constrained problem (3)-(7) to a 
box-constrained discontinuous one. 

Therefore, in order to give an insight into the principal ideas of the 
authors’ techniques for solving the stated problem, box-constrained Lipschitz 
global optimization problem (3), (5), (6) will be considered in the following. 

Once a valid estimate of the Lipschitz constant is known and some func¬ 
tion trials are performed, the Lipschitz condition (6) allows us to easily 
find the lower bounds of a Lipschitz function at different subregions of the 
search domain D from (5). Let us consider, for the sake of example, a one¬ 
dimensional objective function /(x) defined over an interval [a,b] (see Fig¬ 
ure 1) that satisfies the Lipschitz condition (6) with a known Lipschitz con¬ 
stant L. If the function values Zi have been obtained at points x^, 0 < i < A: 
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Figure 1: Lower bounding function Fk{x) (dashed line) constructed for a 
Lipschitz function f(x) (solid line) over [a,b] after having performed k + 1 
function trials (in this case, k = 6). 


(see black dots on the objective function graph in Figure 1), the following 
inequality is satisfied over [a, 6]: 

fix) > Fkix) = max {zi - L\x - Xi\}, (8) 

0<i<k 

where Fk{x) is a piecewise linear function (called lower bounding or mino- 
rant function, see, e.g., [73, 80, 45]; its graph is drawn by a dashed line in 
Figure 1). 

A method (e.g., the Piyavskij-Shubert method being one of the first 
methods in Lipschitz global optimization, see [28, 73, 80, 45, 13]), using in 
its work this simple but efficient geometric interpretation, iteratively con¬ 
structs an auxiliary function which bounds the objective function f{x) from 
below and evaluates f{x) at a point (xt in Figure 1) corresponding to a 
minimum of the bounding function. This point is easy to find (see, e.g., 
[28, 73, 80, 45]). The methods of this type form the class of geometric al¬ 
gorithms that are based on constructing, updating, and improving auxiliary 
piecewise functions built by using an estimate of the Lipschitz constant L. It 
should be noted in this connection, that similar ideas are used in many other 
surrogate-based optimization methods (see, e.g., [76, 5, 29, 18]). As shown, 
e.g., in [72, 80], there exists a strong relationship between the geometric ap¬ 
proach and another possible technique for solving the stated problem—the 
so-called information-statistical approach (see, e.g., [80, 79] and also [53, 85] 
for other probabilistic techniques). Together with the geometric ideas of the 
Piyavskij-Shubert method, it has consolidated foundations of the Lipschitz 
global optimization. 

In order to develop Lipschitz global optimization methods, the Lipschitz 
constant L from (6) should be estimated. It can be done in several ways. For 
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example, the Lipschitz constant can be given a priori (see, e.g., [28, 13, 15]). 
More practical approaches are based on an adaptive estimation of L in the 
course of the search: such algorithms can use either an adaptive global esti¬ 
mate of the Lipschitz constant (see, e.g., [59, 80, 79, 41]) valid for the whole 
search domain 74, or adaptive local estimates Lj valid only for some sub- 
regions Di d D (see, e.g., [72, 80, 65, 66, 40]). Estimating local Lipschitz 
constants during the work of a global optimization algorithm allows one to 
significantly accelerate the global search. Balancing between local and global 
information must be performed in an appropriate way (see, e.g., [72, 80, 65]) 
since an unjustified usage of local information can lead to the loss of the 
global solution (see, e.g., [78]). Finally, multiple estimates of L can be also 
used (see, e.g., [72, 34, 30, 71, 19]). We would like to emphasize here that 
either the Lipschitz constant is given and an algorithm is developed corre¬ 
spondingly, or it is not known but there exist a sufficiently large number 
of parameters of the considered algorithm ensuring its convergence (conver¬ 
gence properties of the Lipschitz global optimization methods are thoroughly 
examined, e.g., in [72, 80, 79, 67]). 

Considering both the theoretical generality and the application diffusion 
of the Lipschitz global optimization problem (3), (5), (6), it is used by the 
authors to mathematically model various real-life optimal design problems 
(see [72, 80, 21, 39, 38, 69]). 

4 Some deterministic tools in Lipschitz global op¬ 
timization 

In this Section, some innovative deterministic approaches developed by the 
authors for constructing efficient global optimization techniques are briefly 
presented as in [43]. The consolidated success of these ideas, confirmed by 
important international publications and presentations around the world, 
allows the authors’ group, on the one hand, to develop promising optimiza¬ 
tion approaches over a solid scientific basis, thus eliminating the theoretical 
faults risks, and, on the other hand, to tackle difficult black-box practical 
optimization problems (e.g., from control theory, environmental sciences and 
geological mechanics, electrical engineering and telecommunications, gravi¬ 
tational physics, etc.) with more efficiency with respect to the techniques 
traditionally used by the optimal design engineers. 

4.1 ‘Divide-the-Best’ algorithms 

Many global optimization algorithms (of both deterministic and stochastic 
types) have a similar structure. Therefore, several attempts aiming to con¬ 
struct a general framework for describing computational schemes and pro¬ 
viding their convergence conditions in a unified manner have been made (see. 




Figure 2: Flow chart of ‘Divide-the-Best’ scheme. 


e.g., [17, 28, 59, 24]). One of the more flexible and robust among such unify¬ 
ing schemes is the ‘Divide-the-Best’ approach (see [72, 67]), which general¬ 
izes both the schemes of adaptive partition [59] and characteristic [72, 80, 24] 
algorithms, widely used for describing and studying numerical global opti¬ 
mization methods. 

In this scheme (the flow chart of its generic iteration is reported in Fig¬ 
ure 2), given a vector p of the method parameters, an adaptive partition of 
the admissible region D from (5) into a collection {D^'\ of the finite number 
of robust subsets is considered at each iteration k. The ‘merit’ (called 
characteristic) Ri of each subset (see Step 2 in Figure 2) for performing a 
subsequent, more detailed, investigation (see Steps 3 and 4 in Figure 2) is 
estimated on the basis of the obtained information X^, about the objec¬ 
tive function. The best (in some predefined sense) characteristic obtained 
over some hyperinterval corresponds to a higher possibility to find the 
global minimizer within (see Step 3). This hyperinterval is subdivided 
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at the next iteration of the algorithm. Naturally, more than one ‘promising’ 
hyperinterval can be partitioned at every iteration. 

Several strategies (mainly, in the context of the geometric approach) for 
selection of a subset for further partitioning (see Step 3 in Figure 2) and 
for performing this partitioning (by means of an operator P, see Step 4 in 
Figure 2 and the next subsection 4.2) are proposed by the authors from 
a general viewpoint and successfully used for solving practical applications 
(see, e.g., the references in [72, 73]). 

Regarding the stopping criteria, one can constantly check, e.g., the vol¬ 
ume of a hyperinterval with the best characteristic or depletion of computing 
resources such as the maximum number of trials. The verification of a stop¬ 
ping criterion can be performed at any step of the current iteration of the 
algorithm. 

Convergence properties of the ‘Divide-the-Best’ family for different types 
of characteristic values and partition operators are studied in [72, 67]. Great 
attention is given to situations (very important in practice) when conditions 
of global (local) convergence are satisfied not in the whole search domain D, 
but only in its small subregion (or a set of subregions). This can corre¬ 
spond, for example, to Lipschitz global optimization algorithms that work 
underestimating the Lipschitz constant or which are oriented on using local 
information in subregions of D (see, e.g., [72, 80, 67]). It should be also 
noted that the described scheme can be successfully applied to constructing 
parallel multidimensional global optimization algorithms [80, 24]. 

4.2 Efficient partitioning strategy 

Regarding the partitioning strategies (partitioning operator P on Step 4 
in Figure 2), the main attention of the authors is focused on the diagonal 
partition strategies (see the references in [59, 72, 73, 71]). 

In this approach, the initial hyperinterval D from (5) is partitioned into a 
set of smaller hyperintervals, the objective function is evaluated only at two 
vertices corresponding to the main diagonal of hyperintervals of the current 
partition of D (see, e.g., points ai and of a hyperinterval Di in Figure 3), 
and the results of these evaluations are used to select a hyperinterval for the 
further subdivision. The diagonal approach has a number of attractive theo¬ 
retical properties and has proved to be efficient in solving applied problems. 

First, it allows one to easily perform an extension of efficient univari¬ 
ate global optimization algorithms to the multidimensional case (see, e.g., 
[72, 73, 71]). In fact, in order to calculate the characteristic Ri of a multidi¬ 
mensional subregion Dj, some one-dimensional characteristics can be used as 
prototypes. After an appropriate transformation they can be applied to the 
one-dimensional segment being the main diagonal [a^, 6*] of the hyperinterval 
Di (see Lipschitz-based lower bounding functions P[i and H 2 in Figure 3). 

Second, the diagonal approach is close from the computational point 
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Figure 3: Lower bounding a Lipschitz function f{x) over a hyperinterval Di 
in a diagonal algorithm. 


of view to one of the simplest strategies—centre-sampling technique (see, 
e.g., [6, 30, 14, 15, 19])—but at the same time, the objective function is eval¬ 
uated at two points of each subregion, providing in this way more information 
about the function over the subregion than centre-sampling methods. 

Different exploration techniques based on various diagonal adaptive par¬ 
tition strategies are analyzed, e.g., in [72, 73, 68]. It is demonstrated that 
partition strategies traditionally used in the framework of the diagonal ap¬ 
proach do not fulfil the requirements of computational efficiency because 
of the execution of many redundant trials. Such a redundancy slows down 
significantly the global search in the case of costly functions. 

An efficient diagonal partition strategy is therefore proposed in [72, 68], 
that allows one to avoid the computational redundancy of traditional diag¬ 
onal schemes. In contrast to these schemes, the proposed strategy produces 
regular meshes of the function evaluation points in such a way that one ver¬ 
tex where f{x) is evaluated can belong to several hyperintervals (up to 2^, 
N is the problem dimension from (5)). Thus, the time-consuming procedure 
of the function evaluations is replaced by a significantly faster operation of 
reading (up to 2^ times) the function values obtained at the previous it¬ 
erations and saved in a special database (see, e.g., [37, 36]). Hence, this 
partition strategy considerably speeds up the search and also leads to saving 
computer memory. It is particularly important that these advantages be¬ 
come more pronounced when the problem dimension N increases (see, e.g., 
[72, 71, 41]). 

A novel scheme for creating fast Lipschitz global optimization algorithms 
is, thus, introduced by the authors. It relies on the efficient diagonal par¬ 
tition strategy allowing an efficient extension of popular one-dimensional 
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Lipschitz global optimization algorithms to the multidimensional case. In a 
sense, this scheme combines the ideas of the diagonal approach and Peano 
space-filling curves (see, e.g., [80, 79, 75]). Innovative multidimensional di¬ 
agonal algorithms for solving Lipschitz global optimization problems, based 
on different ways for obtaining the Lipschitz information and developed in 
the framework of the efficient diagonal scheme, are proposed by the authors 
and their convergence properties are analyzed, e.g., in [72, 71, 41]. 

4.3 Balancing local and global information 

Is well known (see, e.g., [17, 28, 72, 80, 78]) that the usage of the only global 
information on the objective function and constraints during optimization 
can lead to a slow convergence of algorithms to global minimizers. Therefore, 
particular attention is paid by the authors to the usage of local information 
in global optimization methods, as well. One of the traditional ways in this 
context (see, e.g., [17, 28, 55]) recommends stopping the global procedure 
and switching to a local optimization method in order to improve the solution 
and to accelerate the search during its final phase. Unfortunately, applying 
this technique can lead to some problems related to the combination of global 
and local phases, the main problem being that of determining when to stop 
the global procedure and start the local one. A premature arrest can provoke 
the loss of the global solution whereas a late one can slow down the search. 

Theoretical and experimental results obtained by the authors (see, e.g., 
[72, 80, 65, 40]) confirm that more fruitful approaches can be considered. 
The first one is the so-called local tuning approach [65] allowing global op¬ 
timization algorithms to tune their behaviour to the shape of the functions 
at different parts of the search domain by estimating the local Lipschitz 
constants. 

In fact, the Lipschitz constant L has a significant influence on the conver¬ 
gence speed of the Lipschitz global optimization algorithms and the problem 
of its specifying is of great importance. Accepting, for instance, too high a 
value of L for a concrete objective function means assuming that the function 
has complicated structure with sharp peaks and narrow attraction regions of 
minimizers within the whole admissible region. Thus, if the value of L does 
not correspond to the real behaviour of the objective function, it can lead to a 
slow convergence of the algorithm to the global minimizer. Global optimiza¬ 
tion algorithms using in their work a global estimate of L (or some values of 
L given a priori) do not take into account local information about behaviour 
of the objective function over every small subregion of D. Therefore, es¬ 
timating local Lipschitz constants allows one to significantly accelerate the 
global search (see, e.g., [72, 80, 66, 40]). 

The second technique regards a continual local improvement of the cur¬ 
rent best solution incorporated in a global search procedure (see, e.g., [72, 
71, 47, 48]). Particularly, it forces the global optimization method to make 
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a local improvement of the best approximation of the global minimum im¬ 
mediately after a new approximation better than the current one is found. 
These techniques become even more efficient when information about the 
objective function derivatives is available (see, e.g., [44, 42]). 

4.4 Computational aspects 

A particular attention is paid by the authors to the problem of testing global 
optimization algorithms. As widely accepted, a set of test functions is usually 
taken for this purpose, problems from this set are solved by the algorithms 
to be compared, and a conclusion about the efficiency of these algorithms is 
made on the basis of the obtained numerical results. This approach, being an 
important instrument for acquiring a knowledge about the existing and new 
global optimization algorithms, presents at the same time some limitations 
since the conclusions made can be valid only for the selected functions, and 
their propagation to a more wide set of functions requires particular caution. 
Testing an algorithm on a relatively large set of test functions can, in a sense, 
diminish these limitations, but it needs, among other things, the coding of the 
functions, and it is a tedious and time-consuming job. Moreover, the lack 
of such information as number of local optima, their locations, attraction 
regions, local and global values, describing global optimization tests taken 
from real-life applications, creates additional difficulties in verifying validity 
of the algorithms. Therefore, the global optimizers are very interested in 
simple and powerful software tools realizing test problems. As observed, 
e.g., in [72, 80, 85, 61, 23, 12, 54], a well designed testing framework is 
of the primary importance in identifying the merits of each algorithm and 
implementation. 

To tackle the problem of testing global optimization algorithms system¬ 
atically, the GKLS-generator described in [20] is proposed by the authors’ 
group. The generator produces several classes of multidimensional and mul¬ 
tiextremal test functions with known local and global minima. Each test 
class provided by the generator includes 100 functions. By changing the 
user-defined parameters, classes with different properties can be created. 
For example, fixed dimension of the functions and number of local minima, 
a more difficult class can be created either by shrinking the attraction region 
of the global minimizer, or by moving the global minimizer closer to the 
domain boundary. 

The generator is available on the ACM Collected Algorithms (CALCO) 
database (the CALCO is part of a family of publications produced by the 
Association for Computing Machinery) and it is also downloadable for free 
from http: \\wwwinf o. dimes. unical. it\~yaro\GKLS. html. It has already 
been downloaded by companies and research organizations from more than 
40 countries of the world. 
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5 Some numerical results 

To conclude, we would like to report some numerical results obtained by 
using a Lipschitz global optimization method proposed by the authors in [71]. 
In developing this method for solving problem (3), (5), (6), techniques from 
the previous Section have been applied. Particularly, it is a multidimensional 
‘Divide-the-Best’ global optimization method that uses in its work multiple 
estimates of the Lipschitz constant and based on efficient diagonal partitions. 

Numerical results performed on the GKLS-generator to compare this al¬ 
gorithm with two algorithms belonging to the same class of methods for 
solving problem (3), (5), (6) — the DIRECT algorithm from [30] and its 
locally-biased modification DIRECT/ from [19] — are presented here, as 
described in [71]. As known, both of these methods are widely used in solv¬ 
ing practical engineering problems (see, e.g., the references in [17, 34, 71]). 
Moreover, as shown numerically in [22], they often outperform metaheuristic 
algorithms as, e.g., the Eirefly algorithm (see [83]) belonging to the widely 
used family of Particle Swarm Optimization algorithms (see, e.g., [83, 82]). 

Eight CKLS classes of continuously differentiable test functions of di¬ 
mensions Ai = 2, 3, 4, and 5 have been used. For each dimension, both a 
‘hard’ and a ‘simple’ classes have been considered. The difficulty of a class 
was increased either by decreasing the radius of the attraction region of the 
global minimizer, or by decreasing the distance from the global minimizer 
X* to the domain boundaries. 

The global minimizer x* £ D was considered to be found when the 
algorithm generated a trial point x' inside a hypercube with a vertex x* 
and the volume smaller than the volume of the initial hypercube D = [a, b] 
multiplied by an accuracy coefficient A, 0 < A < 1, i.e., 

\x'ij)-x*{j)\ < VAm-aU)) (9) 

for all i, 1 < j < N, where N is from (5). The algorithm stopped either 
when the maximal number of trials equal to 1 000 000 was reached, or when 
condition (9) was satisfied. 

In view of the high computational complexity of each trial of the objective 
function, the methods were compared in terms of the number of evaluations 
of f{x) required to satisfy condition (9). The number of hyperintervals 
generated until condition (9) is satisfied, was taken as the second criterion 
for comparison of the methods. This number reffects indirectly degree of 
qualitative examination of D during the search for a global minimum (see, 
e.g., [72, 71, 41]). 

Results of numerical experiments with eight CKLS tests classes are re¬ 
ported in Tables 1~2. These tables show, respectively, the maximal number 
of trials and the corresponding number of generated hyperintervals required 
for satisfying condition (9) for a half of the functions of a particular class 
(columns “50%”) and for all 100 function of the class (columns “100%”). The 
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Table 1: Number of trial points for 800 GKLS test functions. 


N 

A 

Class 

50% 

100% 

DIRECT 

DIRECT/ 

New 

DIRECT 

DIRECT/ 

New 

2 

■OBB 

simple 


152 

166 

1159 


403 

2 


hard 


1328 

613 

3201 


1809 

3 

■nsa 

simple 

386 

591 

615 

12507 

13309 


3 


hard 

1749 

1967 

1743 

>1000000 (4) 

29233 

Hi 

4 

■nHB 

simple 


7194 


>1000000 (4) 

118744 

14520 

4 


hard 


33147 


>1000000 (7) 

287857 

42649 

5 

10-' 

simple 

1660 

9246 


>1000000 (1) 

178217 

33533 

5 

10-7 

hard 

55092 

126304 


>1000000 (16) 

>1000000 (4) 

93745 


Table 2: Number of hyperintervals for 800 GKLS test functions. 


N 

A 

Class 

1 50% 

1 100% 1 

DIRECT 

DIRECT/ 

New 

DIRECT 

DIRECT/ 

New 

2 

2 

10-^ 

10-^ 

simple 

hard 

111 

1062 

152 

1328 

269 

1075 

1159 

3201 

2318 

3414 

685 

3307 

3 

3 

lO-" 

10-® 

simple 

hard 

386 

1749 

591 

1967 

1545 

5005 

12507 

>1000000 

13309 

29233 

6815 

17555 

4 

4 

lO-" 

10-® 

simple 

hard 

4805 

16114 

7194 

33147 

15145 

68111 

>1000000 

>1000000 

118744 

287857 

73037 

211973 

5 

5 

10-' 
10-7 

simple 

hard 

1660 

55092 

9246 

126304 

21377 

177927 

>1000000 

>1000000 

178217 

>1000000 

206323 

735945 


notation “> 1 000 000 (A:)” means that after 1 000 000 trials the method 
under consideration was not able to solve k problems. 

Note that on a half of test functions from each class (which were simple 
for each method with respect to the other functions of the class) the algo¬ 
rithm from [71] manifested a good performance with respect to DIREGT 
and DIREGT / in terms of the number of generated trial points (see Table 1). 
When all functions were taken in consideration, the number of trials pro¬ 
duced by the new algorithm was significantly fewer in comparison with two 
other methods (see columns “100%” of Table 1), providing at the same time 
a good examination of the admissible region (see Table 2). 

As it can be seen from Tables 1-2, the method [71] demonstrates a 
quite satisfactory performance with respect to popular DIREGT [30] and 
DIREGT/ [19] methods when multidimensional functions with a really com¬ 
plex structure are minimized. Its superiority can be also confirmed by the 
so-called operating characteristics (introduced in 1978 in [23], see [80] for 
their English language description; they can be considered as predecessors of 
‘performance profiles’ from [12] and ‘data profiles’ from [54]). The operating 
characteristics, as an indicator of the efficiency of an optimization method, 
are formed by the pairs {k,P{k)) where k {k > 0) is the number of trials 
and P{k) (0 < P{k) < M) is the number of test problems (among a set of 
M tests) solved by the method with less than or equal to k function trials. 
It is convenient to represent this indicator in a graph where each pair (for 
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^^New, Simple class 

^^New, Hard class 

“^“DIRECT, Simple class 

-A-DIRECT, Hard class 

“•“DIRECTS, Simple class 

DIRECTS, Hard class 


Figure 4: Operating characteristics of the methods New from [71], DIRECT, 
and DIRECT/on the ‘simple’ and ‘hard’ four-dimensional GKLS test classes 


increasing values of k) corresponds to a point on the plane. 

Eor example, Eigure 4 illustrates the operating characteristics for the new 
method and the DIRECT and DIRECT/ methods while solving M = 100 
four-dimensional functions of both ‘simple’ and ‘hard’ GKLS classes. The 
transition from the ‘simple’ class to the ‘hard’ one can be clearly traced in 
the diagram: e.g., the new method has solved 50 problems from the ‘simple’ 
class after 4098 trials while the solution of 50 problems from the ‘hard’ class 
has required 15064 trials (see the intersection of the horizontal line P{k) = 50 
with graphs labelled ‘New (Simple)’ and ‘New (Hard)’ in Figure 4). 

By examining the operating characteristics in Figure 4, it can be seen that 
for functions with simple structure (up to 50 functions of the ‘simple’ class 
and up to 40 functions of the ‘hard’ class) all three methods behave similarly. 
But from the global optimization viewpoint such functions are not interest¬ 
ing because it is possible to successfully minimize them even by very naive 
methods. The situation is changed when problems with complex structure 
should be solved: here, both the DIRECT and DIRECT/ methods experi¬ 
ence serious difficulties. For example, on the ‘hard’ class the new method has 
solved all 100 problems after about 43000 trials while the DIRECT method 
has solved 75 problems after the same number of trials and the DIRECT/ 
method — only 55 problems. Note also that even after 80000 trials neither 
DIRECT nor DIRECT/ methods were able to solve all the problems of both 
the ‘simple’ and ‘hard’ classes of dimension iV = 4 (see the most right vertical 
line in Eigure 4). A similar situation occurs for the operating characteristics 
on classes of dimensions N = 2, 3, and 5, thus, confirming the efficiency of 
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the proposed method compared with the DIRECT and DIRECT? methods 
when solving multidimensional multiextremal problems. 

This method, as an example of the deterministic techniques mentioned 
in the previous Sections, not only has manifested a high performance on a 
large set of tests, but has been also successfully applied for solving real-world 
global optimization problems. For example, its application to a control the¬ 
ory problem has been considered in [39]. This problem regards global tuning 
of fuzzy power system stabilizers present in a multi-machine power system 
in order to damp the power system oscillations. Power system stabilizers 
with conventional industry structure are extensively used in modern power 
systems as an efficient means of damping power. Traditionally their param¬ 
eters are determined by a local tuning procedure based on a single-machine 
infinite-bus system in which the effects of inter-machine and inter-area dy¬ 
namics are usually ignored. Heuristic methods (like genetic algorithms) are 
usually used for their optimizing (as in many other engineering contexts) 
that often leads to very rough solutions (see, e.g., the references in [39]). 
To improve overall system dynamic performance, novel global optimization 
techniques have been therefore applied bv the authors’ group in [391. 



Figure 5: Solutions to the problem of global tuning fuzzy power system 
stabilizers (see [39]) obtained by applying the method [71] based on the 
authors’ techniques (solid line) and by a traditionally used genetic approach 
(dotted line). 


In Figure 5, the graph that illustrates the best solution (the axis of or¬ 
dinates) obtained by a particular genetic algorithm (often used by engineers 
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from the control field) and by the method [71] after a number of simulations 
(the axis of abscissas) is reported. It can be seen that the global optimiza¬ 
tion method proposed by the authors spent more function trials (namely, 
284 trials) than the genetic algorithm at the initial iterations. This phase 
corresponds to the initial exploration of the search domain and it is nec¬ 
essary for all global optimization techniques. On the initial phase of the 
work (less than 300 trials) the genetic algorithm has found local solutions 
to the problem better than those found by the method [71], but far from 
the final global solution (/* ~ 0.533). However, it is more important and 
should be underlined that the method [71] has determined a solution to the 
problem very close to the global optimal one (as demonstrated in [39]) in 
almost half of the simulations with respect to the genetic algorithm (284 tri¬ 
als for the method [71] and 500 for the genetic algorithm). Moreover, it has 
found an attraction region of a new minimizer with a much better solution 
to the problem (see the graph jump in Figure 5 around 450 trials) than that 
found by the genetic approach. Thus, when a reasonable limit of function 
trials is given, the considered method [71] can determine a good estimate 
of the global solution to the studied control theory problem faster than the 
traditionally used genetic techniques. 

Therefore, global optimization techniques briefly presented in this sur¬ 
vey can provide the scientists and engineers with comprehensive and powerful 
tools for successful solving challenging decision-making problems from dif¬ 
ferent real-life application areas, which are characterized by black-box mul¬ 
tiextremal and hard to evaluate functions. A more detailed and systematic 
comparison of the described deterministic approaches with some heuristic 
nature inspired techniques widely used in engineering applications could be 
an interesting and useful direction of future research. 
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